无监督的交叉模式医学图像适应旨在减轻不同成像方式之间的严重域间隙,而无需使用目标域标签。该活动的关键依赖于对齐源和目标域的分布。一种常见的尝试是强制两个域之间的全局对齐,但是,这忽略了致命的局部不平衡域间隙问题,即,一些具有较大域间隙的局部特征很难转移。最近,某些方法进行一致性,重点是地方区域,以提高模型学习的效率。尽管此操作可能会导致上下文中关键信息的缺陷。为了应对这一限制,我们提出了一种新的策略,以减轻医学图像的特征,即全球本地联盟的一致性,以减轻域间隙不平衡。具体而言,功能 - 触发样式转移模块首先合成类似目标的源包含图像,以减少全局域间隙。然后,集成了本地功能掩码,以通过优先考虑具有较大域间隙的判别特征来减少本地特征的“间隙”。全球和局部对齐的这种组合可以精确地将关键区域定位在分割目标中,同时保持整体语义一致性。我们进行了一系列具有两个跨模式适应任务的实验,i,e。心脏子结构和腹部多器官分割。实验结果表明,我们的方法在这两个任务中都达到了最新的性能。
translated by 谷歌翻译
域适应(DA)最近在医学影像社区提出了强烈的兴趣。虽然已经提出了大量DA技术进行了用于图像分割,但大多数这些技术已经在私有数据集或小公共可用数据集上验证。此外,这些数据集主要解决了单级问题。为了解决这些限制,与第24届医学图像计算和计算机辅助干预(Miccai 2021)结合第24届国际会议组织交叉模态域适应(Crossmoda)挑战。 Crossmoda是无监督跨型号DA的第一个大型和多级基准。挑战的目标是分割参与前庭施瓦新瘤(VS)的后续和治疗规划的两个关键脑结构:VS和Cochleas。目前,使用对比度增强的T1(CET1)MRI进行VS患者的诊断和监测。然而,使用诸如高分辨率T2(HRT2)MRI的非对比度序列越来越感兴趣。因此,我们创建了一个无人监督的跨模型分段基准。训练集提供注释CET1(n = 105)和未配对的非注释的HRT2(n = 105)。目的是在测试集中提供的HRT2上自动对HRT2进行单侧VS和双侧耳蜗分割(n = 137)。共有16支球队提交了评估阶段的算法。顶级履行团队达成的表现水平非常高(最佳中位数骰子 - vs:88.4%; Cochleas:85.7%)并接近完全监督(中位数骰子 - vs:92.5%;耳蜗:87.7%)。所有顶级执行方法都使用图像到图像转换方法将源域图像转换为伪目标域图像。然后使用这些生成的图像和为源图像提供的手动注释进行培训分割网络。
translated by 谷歌翻译
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
translated by 谷歌翻译
While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
translated by 谷歌翻译
A key feature of federated learning (FL) is to preserve the data privacy of end users. However, there still exist potential privacy leakage in exchanging gradients under FL. As a result, recent research often explores the differential privacy (DP) approaches to add noises to the computing results to address privacy concerns with low overheads, which however degrade the model performance. In this paper, we strike the balance of data privacy and efficiency by utilizing the pervasive social connections between users. Specifically, we propose SCFL, a novel Social-aware Clustered Federated Learning scheme, where mutually trusted individuals can freely form a social cluster and aggregate their raw model updates (e.g., gradients) inside each cluster before uploading to the cloud for global aggregation. By mixing model updates in a social group, adversaries can only eavesdrop the social-layer combined results, but not the privacy of individuals. We unfold the design of SCFL in three steps. \emph{i) Stable social cluster formation. Considering users' heterogeneous training samples and data distributions, we formulate the optimal social cluster formation problem as a federation game and devise a fair revenue allocation mechanism to resist free-riders. ii) Differentiated trust-privacy mapping}. For the clusters with low mutual trust, we design a customizable privacy preservation mechanism to adaptively sanitize participants' model updates depending on social trust degrees. iii) Distributed convergence}. A distributed two-sided matching algorithm is devised to attain an optimized disjoint partition with Nash-stable convergence. Experiments on Facebook network and MNIST/CIFAR-10 datasets validate that our SCFL can effectively enhance learning utility, improve user payoff, and enforce customizable privacy protection.
translated by 谷歌翻译
Transformer-based models have been widely demonstrated to be successful in computer vision tasks by modelling long-range dependencies and capturing global representations. However, they are often dominated by features of large patterns leading to the loss of local details (e.g., boundaries and small objects), which are critical in medical image segmentation. To alleviate this problem, we propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs, namely, the Global-to-Local Spatial Aggregation (GLSA) and Selective Boundary Aggregation (SBA) modules. The GLSA has the ability to aggregate and represent both global and local spatial features, which are beneficial for locating large and small objects, respectively. The SBA module is used to aggregate the boundary characteristic from low-level features and semantic information from high-level features for better preserving boundary details and locating the re-calibration objects. Extensive experiments in six benchmark datasets demonstrate that our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images. In addition, our approach is more robust than existing methods in various challenging situations such as small object segmentation and ambiguous object boundaries.
translated by 谷歌翻译
Users' involvement in creating and propagating news is a vital aspect of fake news detection in online social networks. Intuitively, credible users are more likely to share trustworthy news, while untrusted users have a higher probability of spreading untrustworthy news. In this paper, we construct a dual-layer graph (i.e., the news layer and the user layer) to extract multiple relations of news and users in social networks to derive rich information for detecting fake news. Based on the dual-layer graph, we propose a fake news detection model named Us-DeFake. It learns the propagation features of news in the news layer and the interaction features of users in the user layer. Through the inter-layer in the graph, Us-DeFake fuses the user signals that contain credibility information into the news features, to provide distinctive user-aware embeddings of news for fake news detection. The training process conducts on multiple dual-layer subgraphs obtained by a graph sampler to scale Us-DeFake in large scale social networks. Extensive experiments on real-world datasets illustrate the superiority of Us-DeFake which outperforms all baselines, and the users' credibility signals learned by interaction relation can notably improve the performance of our model.
translated by 谷歌翻译
Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 3.8$\times$ and parse tree node-type overlap by 1.4$\times$ relative to current approaches for private synthetic data generation, improving both on fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show gains of 1.3$\times$ on its accuracy with the new feature.
translated by 谷歌翻译
We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. We first annotate instructions for 330 diverse tasks and train INSTRUCTOR on this multitask mixture with a contrastive loss. We evaluate INSTRUCTOR on 70 embedding evaluation tasks (66 of which are unseen during training), ranging from classification and information retrieval to semantic textual similarity and text generation evaluation. INSTRUCTOR, while having an order of magnitude fewer parameters than the previous best model, achieves state-of-the-art performance, with an average improvement of 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
translated by 谷歌翻译
A key missing ability of current language models (LMs) is grounding to real-world environments. Most existing work for grounded language understanding uses LMs to directly generate plans that can be executed in the environment to achieve the desired effects. It casts the burden of ensuring grammaticality, faithfulness, and controllability all on the LMs. We propose Pangu, a generic framework for grounded language understanding that capitalizes on the discriminative ability of LMs instead of their generative ability. Pangu consists of a symbolic agent and a neural LM working in a concerted fashion: the agent explores the environment to incrementally construct valid candidate plans, and the LM evaluates the plausibility of the candidate plans to guide the search process. A case study on the challenging problem of knowledge base question answering (KBQA), which features a massive environment, demonstrates the remarkable effectiveness and flexibility of Pangu: A BERT-base LM is sufficient for achieving a new state of the art on standard KBQA datasets, and larger LMs further improve the performance by a large margin. Pangu also enables, for the first time, effective few-shot in-context learning for KBQA with large LMs such as Codex.
translated by 谷歌翻译